# Zero-Shot Classification

| Model | Author | License | Description | Task | Tags | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Clip Vitl14 Test Time Registers | amildravid4292 | MIT | Built on OpenCLIP-ViT-L-14, with test-time registers added to improve the model's interpretability and downstream task performance. | Text-to-Image | Transformers | 236 | 0 |
| Sail Clip Hendrix 10epochs | cringgaard | | A vision-language model fine-tuned from openai/clip-vit-large-patch14 for 10 epochs. | Text-to-Image | Transformers | 49 | 0 |
| Git RSCLIP | lcybuaa | Apache-2.0 | A vision-language model pretrained on the Git-10M dataset, specializing in multimodal understanding of remote sensing imagery. | Text-to-Image | Safetensors | 59.37k | 4 |
| Thesis Clip Geoloc Continent | jrheiner | | A CLIP-ViT model fine-tuned for continent-level image geolocation. | Image-to-Text | Transformers, English | 82 | 0 |
| Git Base | microsoft | MIT | GIT, a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for image-to-text generation. | Image-to-Text | Transformers, multilingual | 365.74k | 93 |
| Taiyi CLIP RoBERTa 326M ViT H Chinese | IDEA-CCNL | Apache-2.0 | The first open-source Chinese CLIP model, pre-trained on 123 million image-text pairs with RoBERTa-large as the text encoder. | Text-to-Image | Transformers, Chinese | 108 | 10 |
| Japanese Cloob Vit B 16 | rinna | Apache-2.0 | A Japanese CLOOB (Contrastive Leave-One-Out Boost) model trained by rinna Co., Ltd. for cross-modal understanding of images and text. | Text-to-Image | Transformers, Japanese | 229.51k | 12 |
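
The CLIP-family checkpoints above can typically be used for zero-shot classification through the `transformers` zero-shot image classification pipeline. A minimal sketch, assuming `transformers` and `Pillow` are installed; the image path and candidate labels are placeholder inputs, and `openai/clip-vit-large-patch14` (the base model named in the Sail Clip Hendrix entry) stands in for whichever compatible checkpoint you pick:

```python
# Zero-shot image classification with a CLIP checkpoint via the
# transformers pipeline. The image path and labels are placeholders.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14",  # any compatible CLIP checkpoint
)

results = classifier(
    "cat.jpg",  # local path, URL, or PIL.Image
    candidate_labels=["a photo of a cat", "a photo of a dog", "a photo of a car"],
)
for r in results:  # sorted by score, highest first
    print(f"{r['label']}: {r['score']:.3f}")
```

The pipeline embeds the image and each candidate label with the model's image and text encoders, then softmaxes the cosine similarities, which is the standard CLIP zero-shot recipe.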
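
The Image-to-Text entries follow a generate-style captioning flow instead. A sketch for GIT, assuming the hub ID is `microsoft/git-base` (matching the Git Base entry above); the COCO image URL is just a sample input:

```python
# Caption an image with GIT: encode pixels with the processor, then
# autoregressively generate caption tokens with the decoder.
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample image
image = Image.open(requests.get(url, stream=True).raw)

pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```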